Identify genomic component associated with environmental parameters of grow-out ponds

Comparison of reproductive success in red drum hatchery adults

SJ O’Leary

2020-07-15

1 This is a sidenote that was entered using a footnote.

Redundancy analysis is a constrained ordination method that can be used to determine the extent to which one set of variables (explanatory variables) explains the variation in another set of variables (response variables), i.e. it is the multivariate analog of a simple linear regression (Oksanen et al. 2010). It assumes that the relationship between the dependent (response) and independent (explanatory) variables is linear. Here, it can be applied to understand which environmental parameters of the grow-out pond significantly explain a component of the observed pattern of genetic variation of progeny (Forester et al. 2016, 2018).

A partial RDA enables examining the effect of a set of variables (explanatory variables, X) on a response matrix (Y) while controlling for a set of conditioning variables (Z) and can be applied in a case such as here where the strong effect of a set of well-characterized variables (parentage/family) might obscure the signal of a second set of variables (environmental conditions).

Prepare data sets

Response variables (genotypes)

Thin SNPs so only one SNP per contig is retained to reduce linkage among markers.


# thin SNPs within 350 bp of each other
vcftools --vcf data/GENOME_SCAN/SOC.yoy.genotyped.recode.vcf --out data/GENOME_SCAN/SOC.yoy.thinned --thin 350 --recode --recode-INFO-all

Format response variables (genetic data) as allele counts per locus and individual, homozygous genotypes are coded as 0 or 2, heterozygotes as 1.

RDA requires a complete data set, therefore missing values are replaced with the mean allele frequency.

The response variables consist of 2023 loci genotyped for 828 progeny across three spawning events (three sample points per spawning event).

Conditioning variables: Family

Parentage (dam + sire) of each offspring creates the dominant signal in the data set. Because categorical data cannot be used, parentage will be converted into “dummy variables”, i.e. we will create a matrix with each row corresponding to a single progeny and on column per potential parent. Parents will be coded as 1 non-parental adults as 0.

Explanatory variables: Environmental data of grow-out ponds

The spawning events took place in 2017 (N = 1) and 2018 (N = 2). Grow-out ponds exclude predators and ensure abundant prey and while physical parameters (temperature, dissolved oxygen, pH, and salinity) are monitored, they are not controlled with the exception of dissolved oxygen which is maintained by an active water paddle which can be manipulated.

Table 1: Number of individuals at different stages (incubator, stocking into grow-out ponds, harvest) and comparison of survival in incubator and grow-out pond.

EVENT EGGS_INCUBATED FRY_HARVESTED FRY_STOCKED YOY_HARVESTED PROP_SURV_INCUB PROP_SURV_POND
YOY-1 1050000 625628 395254 111306 0.60 0.28
YOY-2 690000 375560 375560 343508 0.54 0.91
YOY-3 1510000 784554 428230 30099 0.52 0.07

The number of eggs placed in the incubator is largely consistent across spawning events 1 and 3 (approx. 1 Mill). By contrast, for spawning event 2 only approximate 690,000 were placed in the incubator; despite though the number of brood fish involved in the spawning event 2 being the highest (individuals from 4 tanks compared to 2 and 3).

Though fry were stocked into the grow-out ponds at similar levels (375,000 - 428,000) the proprtion of progeny that survived to be harvested for stocking varied greatly from 7% (spawning event 3), to 28% spawning event 1, and 91% (spawning event 2). By contrast, survival during the time in the incubator was largely consistent (52 - 60%).

Despite spawning event 2 having the lowest number of eggs incubated, due to the highest proportion of progeny surviving to the next stage both in the incubator and the grow-out pond it had the highest number of progeny harvested for stocking at 343,000 compared to 111,000 and 30,000 for spawning event 1 and 3, respectively. This could be due to differences in environmental conditions of the pond or due to the fact that this pond was harvested early due to construction.

Fig 1: Change in length [mm] of progreny over time spend in grow-out pond for each spawning event.

Fig 1: Change in length [mm] of progreny over time spend in grow-out pond for each spawning event.
## 
##  linear regression growth YOY-1
## 
## Call:
## lm(formula = LENGTH ~ DAY, data = tmp)
## 
## Residuals:
##       1       2       3       4       5 
##  1.0008 -0.4809 -0.8673 -0.4354  0.7828 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  6.55836    1.48335   4.421   0.0215 *
## DAY          0.29545    0.06571   4.496   0.0205 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9639 on 3 degrees of freedom
## Multiple R-squared:  0.8708, Adjusted R-squared:  0.8277 
## F-statistic: 20.22 on 1 and 3 DF,  p-value: 0.02054
## slope (growth) for spawning event 1: 0.3
## 
##  linear regression growth YOY-2
## 
## Call:
## lm(formula = LENGTH ~ DAY, data = tmp)
## 
## Residuals:
##       1       2       3       4       5       6       7 
## -2.9313  0.3588 -0.2878  4.5290  0.4725  1.2160 -3.3573 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  13.2183     3.1577   4.186   0.0086 **
## DAY           0.3366     0.1074   3.135   0.0258 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.908 on 5 degrees of freedom
## Multiple R-squared:  0.6629, Adjusted R-squared:  0.5954 
## F-statistic: 9.831 on 1 and 5 DF,  p-value: 0.0258
## slope (growth) for spawning event 2: 0.34
## 
##  linear regression growth YOY-3
## 
## Call:
## lm(formula = LENGTH ~ DAY, data = tmp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9572 -0.9204  0.3286  0.9082  1.9897 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.11013    1.21904   5.833 0.000642 ***
## DAY          0.10409    0.03448   3.019 0.019410 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.48 on 7 degrees of freedom
## Multiple R-squared:  0.5656, Adjusted R-squared:  0.5036 
## F-statistic: 9.115 on 1 and 7 DF,  p-value: 0.01941
## slope (growth) for spawning event 3: 0.1

Linear regressions show a similar slope for spawning event 1 and 2 (0.3 and 0.34), while the slope for spawning event three is lower (0.1); indicating slowest growth. Generally, fish are expected to spend approximately 30 days in a pond and be about 3 cm when harvested, though growth rates may differ among ponds, especially due to differences in temperature.

Fish were harvested from grow-out pond 2 at a shorter size (approx 1.5 cm) than usually desired (2.5 - 3 cm) due to construction. Due to slow, growth fish from pond 3 were also harvested at a smaller age and spent the most amount of time in the grow-out ponds (> 50).

Fig 2a: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 1. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).

Fig 2a: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 1. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).

Pond culture for spawning event 3 is characterized by a decrease in temperature between sampling point T1 and T2 which then increases again to experience a second more steep (but not as drastic) decline.

Fig 2b: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 2. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).

Fig 2b: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 2. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).

Spawning event 2 has comparatively lower salinity overall and experiences a steep decline in temperature after sampling point T2 after which there is a slow, steady increase in temperature.

Fig 2c: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 3. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).

Fig 2c: Change of physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond during culture for spawning event 3. Dashed vertical lines indicate sample points T1, T2, and T3. Dissolved oxygen and temperature are measures three times a day, in the morning (blue), afternoon (red), and evening (green).

Spawning event 3 has similarly low levels of salinity compared to spawning event 2. This pond was stocked about a week after the pond used for spawning event 2, so it experiences the same steep decline in temperature follow by a gradualt increase but then experiences a second similarly steep but even more drastic drop in temperatures after which the temperatures remain low.

Fig 3: Correlation among physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond (example spawning event 1).

Fig 3: Correlation among physical parameters (temperature, dissolved oxygen, pH, and salinity) measured in each grow-out pond (example spawning event 1).

Morning, evening, and afternoon measurements of temperature are correlated. The same pattern is observed for dissolved oxygen.

Fig 4: Comparison of distribution of physical parameters in grow-out ponds of spawning events 1 (blue), 2 (green), and 3 (red)).

Fig 4: Comparison of distribution of physical parameters in grow-out ponds of spawning events 1 (blue), 2 (green), and 3 (red)).

Not all sampled progeny experienced the entire range of values, i.e. they are removed at different time points. Calculate mean values for three time ranges (day progeny are stocked to time points 1 - 3).

Fig 5: Comparison of distribution of physical parameters in grow-out ponds of spawning events 1 (blue), 2 (green), and 3 (red)).

Fig 5: Comparison of distribution of physical parameters in grow-out ponds of spawning events 1 (blue), 2 (green), and 3 (red)).

Test for significant differences among physical parameters of grow-out ponds for each time range per spawning event.

Temperature (morning)

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value             Pr(>F)    
## RANGE         8   1562  195.29   12.09 0.0000000000000185 ***
## Residuals   237   3829   16.16                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2a: Significant pairwise comparisons of morning temperature using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-2_1 YOY-1_2 4.819028 0.027
YOY-2_1 YOY-1_3 5.508352 0.001
YOY-2_2 YOY-1_2 4.542105 0.010
YOY-2_2 YOY-1_3 5.231429 0.000
YOY-2_3 YOY-2_2 -3.470213 0.023
YOY-3_1 YOY-1_3 4.981429 0.008
YOY-3_3 YOY-2_1 -7.535653 0.000
YOY-3_3 YOY-2_2 -7.258730 0.000
YOY-3_3 YOY-2_3 -3.788517 0.000
YOY-3_3 YOY-3_1 -7.008730 0.000
YOY-3_3 YOY-3_2 -4.583730 0.000

Temperature (afternoon)

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value            Pr(>F)    
## RANGE         8   1772  221.51   11.27 0.000000000000161 ***
## Residuals   239   4698   19.66                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2b: Significant pairwise comparisons of afternoon temperature using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-2_1 YOY-1_2 5.049615 0.041
YOY-2_1 YOY-1_3 5.404060 0.006
YOY-2_2 YOY-1_2 4.651956 0.020
YOY-2_2 YOY-1_3 5.006401 0.001
YOY-2_3 YOY-2_2 -3.582701 0.044
YOY-3_1 YOY-1_3 5.152778 0.017
YOY-3_3 YOY-2_1 -8.171917 0.000
YOY-3_3 YOY-2_2 -7.774258 0.000
YOY-3_3 YOY-2_3 -4.191557 0.000
YOY-3_3 YOY-3_1 -7.920635 0.000
YOY-3_3 YOY-3_2 -4.087302 0.005

Temperature (evening)

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value             Pr(>F)    
## RANGE         8   1616  201.98   11.65 0.0000000000000578 ***
## Residuals   239   4144   17.34                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2c: Significant pairwise comparisons of evening temperature using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-2_1 YOY-1_2 5.094 0.020
YOY-2_1 YOY-1_3 5.653 0.001
YOY-2_2 YOY-1_2 4.299 0.024
YOY-2_2 YOY-1_3 4.857 0.001
YOY-2_3 YOY-2_1 -4.220 0.037
YOY-2_3 YOY-2_2 -3.425 0.037
YOY-3_1 YOY-1_3 5.242 0.006
YOY-3_3 YOY-2_1 -7.950 0.000
YOY-3_3 YOY-2_2 -7.155 0.000
YOY-3_3 YOY-2_3 -3.730 0.000
YOY-3_3 YOY-3_1 -7.539 0.000
YOY-3_3 YOY-3_2 -4.564 0.000

Dissolved Oxygen (morning)

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## RANGE         8   40.1   5.017   3.118 0.00227 **
## Residuals   239  384.5   1.609                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2d: Significant pairwise comparisons of dissolved oxygen in the morning using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-3_3 YOY-1_3 0.974 0.009
YOY-3_3 YOY-2_2 1.101 0.013

Dissolved Oxygen (afternoon)

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## RANGE         8   60.0   7.503   4.524 0.0000387 ***
## Residuals   239  396.4   1.658                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2e: Significant pairwise comparisons of dissolved oxygen in the afternoon using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-2_1 YOY-1_1 -2.092 0.005
YOY-2_1 YOY-1_2 -1.587 0.018
YOY-2_2 YOY-1_1 -1.648 0.024
YOY-3_3 YOY-2_1 1.638 0.001
YOY-3_3 YOY-2_2 1.194 0.006

Dissolved Oxygen (evening)

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value Pr(>F)  
## RANGE         8   36.0   4.494   2.284 0.0226 *
## Residuals   239  470.3   1.968                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2f: Significant pairwise comparisons of dissolved oxygen in the evening using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-3_3 YOY-1_3 0.981 0.026

Salinity

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value              Pr(>F)    
## RANGE         8  13846  1730.7    2211 <0.0000000000000002 ***
## Residuals   239    187     0.8                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2g: Significant pairwise comparisons of salinity using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-1_3 YOY-1_1 1.644 0
YOY-1_3 YOY-1_2 1.294 0
YOY-2_1 YOY-1_1 -17.377 0
YOY-2_1 YOY-1_2 -17.727 0
YOY-2_1 YOY-1_3 -19.021 0
YOY-2_2 YOY-1_1 -17.604 0
YOY-2_2 YOY-1_2 -17.954 0
YOY-2_2 YOY-1_3 -19.249 0
YOY-2_3 YOY-1_1 -17.343 0
YOY-2_3 YOY-1_2 -17.693 0
YOY-2_3 YOY-1_3 -18.987 0
YOY-3_1 YOY-1_1 -14.550 0
YOY-3_1 YOY-1_2 -14.900 0
YOY-3_1 YOY-1_3 -16.194 0
YOY-3_1 YOY-2_1 2.827 0
YOY-3_1 YOY-2_2 3.054 0
YOY-3_1 YOY-2_3 2.793 0
YOY-3_2 YOY-1_1 -14.258 0
YOY-3_2 YOY-1_2 -14.608 0
YOY-3_2 YOY-1_3 -15.903 0
YOY-3_2 YOY-2_1 3.119 0
YOY-3_2 YOY-2_2 3.346 0
YOY-3_2 YOY-2_3 3.084 0
YOY-3_3 YOY-1_1 -13.760 0
YOY-3_3 YOY-1_2 -14.110 0
YOY-3_3 YOY-1_3 -15.405 0
YOY-3_3 YOY-2_1 3.617 0
YOY-3_3 YOY-2_2 3.844 0
YOY-3_3 YOY-2_3 3.582 0

pH

## 
##  RESULTS ANOVA
##              Df Sum Sq Mean Sq F value              Pr(>F)    
## RANGE         8  3.880  0.4850   19.52 <0.0000000000000002 ***
## Residuals   239  5.939  0.0249                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Global comparison is significant, test for pairwise comparisons among time ranges across spawning events.

Table 2h: Significant pairwise comparisons of PH using Tukey-Honest Significanct Differences test.

RANGE1 RANGE2 diff p adj
YOY-2_3 YOY-1_1 -0.187 0.022
YOY-2_3 YOY-1_2 -0.242 0.000
YOY-2_3 YOY-1_3 -0.149 0.001
YOY-3_1 YOY-1_2 -0.182 0.046
YOY-3_2 YOY-1_1 -0.218 0.009
YOY-3_2 YOY-1_2 -0.273 0.000
YOY-3_2 YOY-1_3 -0.181 0.001
YOY-3_3 YOY-1_1 -0.332 0.000
YOY-3_3 YOY-1_2 -0.387 0.000
YOY-3_3 YOY-1_3 -0.294 0.000
YOY-3_3 YOY-2_1 -0.222 0.000
YOY-3_3 YOY-2_2 -0.248 0.000
YOY-3_3 YOY-2_3 -0.146 0.000
YOY-3_3 YOY-3_1 -0.206 0.002

Calculate mean, median, 5th and 95th percentile for salinity, pH, temperature, and dissolved oxygen measurements as potential constraining (explanatory) variables for RDA.

Variable selection

Run step-wise selection of variables using p(in)-value < 0.15 and p(out)-value > 0.1; run forward selection of variable using R2 value and Pin < 0.1 to identify best model(s).

## 
##  
## STEPWISE SELECTION OF VARIABLES (P(in) = 0.15/P(out) = 0.1
## 
## Start: allelecount ~ 1 
## 
##                      Df    AIC       F Pr(>F)   
## + SALINITY_qnt_5      1 4828.7 42.6718  0.005 **
## + SALINITY_MEAN       1 4829.1 42.2160  0.005 **
## + SALINITY_qnt_95     1 4829.2 42.1228  0.005 **
## + SALINITY_qnt_50     1 4829.5 41.7960  0.005 **
## + PH_qnt_95           1 4830.2 41.0988  0.005 **
## + TEMP_AFTERN_qnt_95  1 4830.3 41.0247  0.005 **
## + PH_MEAN             1 4833.4 37.7687  0.005 **
## + PH_qnt_50           1 4840.3 30.5942  0.005 **
## + PH_qnt_5            1 4845.5 25.2515  0.005 **
## + DO_EVEN_MEAN        1 4855.3 15.1787  0.005 **
## + TEMP_AFTERN_qnt_50  1 4859.1 11.3590  0.005 **
## + TEMP_AFTERN_MEAN    1 4863.2  7.2169  0.005 **
## + TEMP_AFTERN_qnt_5   1 4865.8  4.6397  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 
## 
##                  Df    AIC      F Pr(>F)   
## - SALINITY_qnt_5  1 4868.4 42.672  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                      Df    AIC       F Pr(>F)   
## + PH_qnt_95           1 4814.9 15.9024  0.005 **
## + DO_EVEN_MEAN        1 4820.4 10.3716  0.005 **
## + PH_MEAN             1 4820.6 10.0927  0.005 **
## + PH_qnt_5            1 4823.3  7.4311  0.005 **
## + PH_qnt_50           1 4824.0  6.7116  0.005 **
## + TEMP_AFTERN_MEAN    1 4825.4  5.2898  0.005 **
## + TEMP_AFTERN_qnt_5   1 4825.5  5.1930  0.005 **
## + SALINITY_qnt_50     1 4825.7  5.0437  0.005 **
## + TEMP_AFTERN_qnt_95  1 4826.7  3.9652  0.005 **
## + TEMP_AFTERN_qnt_50  1 4826.9  3.7930  0.005 **
## + SALINITY_MEAN       1 4828.3  2.4122  0.005 **
## + SALINITY_qnt_95     1 4828.7  1.9813  0.010 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 
## 
##                  Df    AIC      F Pr(>F)   
## - PH_qnt_95       1 4828.7 15.902  0.005 **
## - SALINITY_qnt_5  1 4830.2 17.428  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                      Df    AIC      F Pr(>F)   
## + SALINITY_qnt_95     1 4812.6 4.2985  0.005 **
## + TEMP_AFTERN_MEAN    1 4813.4 3.4734  0.005 **
## + SALINITY_MEAN       1 4813.4 3.4320  0.005 **
## + PH_qnt_5            1 4813.5 3.3943  0.005 **
## + TEMP_AFTERN_qnt_5   1 4813.6 3.3283  0.005 **
## + TEMP_AFTERN_qnt_50  1 4813.9 2.9706  0.005 **
## + DO_EVEN_MEAN        1 4814.0 2.8568  0.005 **
## + PH_MEAN             1 4814.1 2.7575  0.005 **
## + SALINITY_qnt_50     1 4814.1 2.7505  0.005 **
## + TEMP_AFTERN_qnt_95  1 4814.3 2.5888  0.005 **
## + PH_qnt_50           1 4814.5 2.4242  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 
## 
##                   Df    AIC       F Pr(>F)   
## - SALINITY_qnt_95  1 4814.9  4.2985  0.005 **
## - SALINITY_qnt_5   1 4815.3  4.7336  0.005 **
## - PH_qnt_95        1 4828.7 18.2418  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                      Df    AIC      F Pr(>F)   
## + TEMP_AFTERN_MEAN    1 4810.1 4.4276  0.005 **
## + TEMP_AFTERN_qnt_5   1 4810.6 3.9371  0.005 **
## + PH_qnt_50           1 4810.8 3.7533  0.005 **
## + PH_MEAN             1 4811.1 3.4254  0.005 **
## + TEMP_AFTERN_qnt_50  1 4811.1 3.4189  0.005 **
## + PH_qnt_5            1 4811.3 3.2216  0.005 **
## + TEMP_AFTERN_qnt_95  1 4811.4 3.1220  0.005 **
## + DO_EVEN_MEAN        1 4811.6 2.9571  0.005 **
## + SALINITY_MEAN       1 4811.9 2.6432  0.005 **
## + SALINITY_qnt_50     1 4813.2 1.4176  0.060 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN 
## 
##                    Df    AIC       F Pr(>F)   
## - TEMP_AFTERN_MEAN  1 4812.6  4.4276  0.005 **
## - SALINITY_qnt_95   1 4813.4  5.2526  0.005 **
## - SALINITY_qnt_5    1 4813.7  5.5064  0.005 **
## - PH_qnt_95         1 4825.3 17.2865  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                      Df    AIC      F Pr(>F)   
## + DO_EVEN_MEAN        1 4805.2 6.8815  0.005 **
## + PH_qnt_5            1 4808.0 4.1348  0.005 **
## + SALINITY_qnt_50     1 4809.4 2.6836  0.005 **
## + PH_MEAN             1 4810.7 1.4787  0.030 * 
## + PH_qnt_50           1 4810.8 1.3144  0.045 * 
## + SALINITY_MEAN       1 4810.9 1.2001  0.140   
## + TEMP_AFTERN_qnt_5   1 4811.0 1.1569  0.175   
## + TEMP_AFTERN_qnt_50  1 4811.1 1.0260  0.380   
## + TEMP_AFTERN_qnt_95  1 4811.1 0.9992  0.445   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN 
## 
##                    Df    AIC       F Pr(>F)   
## - DO_EVEN_MEAN      1 4810.1  6.8815  0.005 **
## - TEMP_AFTERN_MEAN  1 4811.6  8.3572  0.005 **
## - PH_qnt_95         1 4813.1  9.8634  0.005 **
## - SALINITY_qnt_5    1 4813.1  9.8740  0.005 **
## - SALINITY_qnt_95   1 4813.3 10.0301  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                      Df    AIC      F Pr(>F)   
## + PH_qnt_5            1 4804.7 2.4785  0.005 **
## + PH_MEAN             1 4805.0 2.2333  0.005 **
## + TEMP_AFTERN_qnt_5   1 4805.4 1.8285  0.010 **
## + PH_qnt_50           1 4805.3 1.9241  0.015 * 
## + SALINITY_qnt_50     1 4805.7 1.5258  0.025 * 
## + TEMP_AFTERN_qnt_50  1 4805.8 1.4677  0.025 * 
## + SALINITY_MEAN       1 4805.9 1.2993  0.070 . 
## + TEMP_AFTERN_qnt_95  1 4806.0 1.1833  0.175   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 
## 
##                    Df    AIC      F Pr(>F)   
## - SALINITY_qnt_5    1 4804.7 1.9499  0.010 **
## - SALINITY_qnt_95   1 4804.8 2.0395  0.010 **
## - PH_qnt_5          1 4805.2 2.4785  0.005 **
## - PH_qnt_95         1 4807.5 4.7041  0.005 **
## - DO_EVEN_MEAN      1 4808.0 5.2164  0.005 **
## - TEMP_AFTERN_MEAN  1 4809.8 7.0638  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                      Df    AIC      F Pr(>F)  
## + PH_MEAN             1 4805.4 1.3144  0.060 .
## + PH_qnt_50           1 4805.4 1.3119  0.060 .
## + SALINITY_qnt_50     1 4805.4 1.3199  0.080 .
## + SALINITY_MEAN       1 4805.4 1.2883  0.105  
## + TEMP_AFTERN_qnt_95  1 4805.6 1.1220  0.190  
## + TEMP_AFTERN_qnt_5   1 4805.6 1.1511  0.195  
## + TEMP_AFTERN_qnt_50  1 4805.6 1.1050  0.205  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 + PH_MEAN 
## 
##                    Df    AIC      F Pr(>F)   
## - PH_MEAN           1 4804.7 1.3144  0.125   
## - SALINITY_qnt_95   1 4804.7 1.3016  0.080 . 
## - SALINITY_qnt_5    1 4804.8 1.4085  0.035 * 
## - PH_qnt_5          1 4805.0 1.5591  0.035 * 
## - DO_EVEN_MEAN      1 4805.1 1.6528  0.020 * 
## - PH_qnt_95         1 4806.0 2.5362  0.005 **
## - TEMP_AFTERN_MEAN  1 4810.5 7.0965  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 
## 
##                      Df    AIC      F Pr(>F)  
## + PH_qnt_50           1 4805.4 1.3119  0.070 .
## + SALINITY_qnt_50     1 4805.4 1.3199  0.085 .
## + PH_MEAN             1 4805.4 1.3144  0.100 .
## + SALINITY_MEAN       1 4805.4 1.2883  0.110  
## + TEMP_AFTERN_qnt_5   1 4805.6 1.1511  0.155  
## + TEMP_AFTERN_qnt_50  1 4805.6 1.1050  0.220  
## + TEMP_AFTERN_qnt_95  1 4805.6 1.1220  0.260  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 + PH_qnt_50 
## 
##                    Df    AIC      F Pr(>F)   
## - PH_qnt_50         1 4804.7 1.3119  0.095 . 
## - PH_qnt_5          1 4805.3 1.8652  0.020 * 
## - SALINITY_qnt_95   1 4805.1 1.6343  0.015 * 
## - SALINITY_qnt_5    1 4805.2 1.7276  0.015 * 
## - DO_EVEN_MEAN      1 4805.7 2.2419  0.005 **
## - PH_qnt_95         1 4808.1 4.6475  0.005 **
## - TEMP_AFTERN_MEAN  1 4810.5 7.0067  0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                      Df    AIC      F Pr(>F)
## + SALINITY_MEAN       1 4806.3 1.1423  0.175
## + TEMP_AFTERN_qnt_50  1 4806.3 1.1423  0.190
## + PH_MEAN             1 4806.3 1.1423  0.200
## + TEMP_AFTERN_qnt_95  1 4806.3 1.1423  0.225
## + TEMP_AFTERN_qnt_5   1 4806.3 1.1423  0.235
## + SALINITY_qnt_50     1 4806.3 1.1423  0.255
## 
##  
## FORWARD SELECTION OF VARIABLES USING R2 value.
## Step: R2.adj= 0 
## Call: allelecount ~ 1 
##  
##                      R2.adjusted
## <All variables>      0.081170092
## + SALINITY_qnt_5     0.047971864
## + SALINITY_MEAN      0.047472101
## + SALINITY_qnt_95    0.047369779
## + SALINITY_qnt_50    0.047011036
## + PH_qnt_95          0.046244810
## + TEMP_AFTERN_qnt_95 0.046163260
## + PH_MEAN            0.042567761
## + PH_qnt_50          0.034548656
## + PH_qnt_5           0.028489278
## + DO_EVEN_MEAN       0.016855715
## + TEMP_AFTERN_qnt_50 0.012371010
## + TEMP_AFTERN_MEAN   0.007461264
## + TEMP_AFTERN_qnt_5  0.004381808
## <none>               0.000000000
## 
##                  Df    AIC      F Pr(>F)   
## + SALINITY_qnt_5  1 4828.7 42.672  0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: R2.adj= 0.04797186 
## Call: allelecount ~ SALINITY_qnt_5 
##  
##                      R2.adjusted
## <All variables>       0.08117009
## + PH_qnt_95           0.06484362
## + DO_EVEN_MEAN        0.05865220
## + PH_MEAN             0.05833784
## + PH_qnt_5            0.05532689
## + PH_qnt_50           0.05450967
## + TEMP_AFTERN_MEAN    0.05289066
## + TEMP_AFTERN_qnt_5   0.05278021
## + SALINITY_qnt_50     0.05260988
## + TEMP_AFTERN_qnt_95  0.05137729
## + TEMP_AFTERN_qnt_50  0.05118021
## + SALINITY_MEAN       0.04959677
## + SALINITY_qnt_95     0.04910154
## <none>                0.04797186
## 
##             Df    AIC      F Pr(>F)   
## + PH_qnt_95  1 4814.9 15.902  0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: R2.adj= 0.06484362 
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 
##  
##                      R2.adjusted
## <All variables>       0.08117009
## + SALINITY_qnt_95     0.06856765
## + TEMP_AFTERN_MEAN    0.06763894
## + SALINITY_MEAN       0.06759221
## + PH_qnt_5            0.06754970
## + TEMP_AFTERN_qnt_5   0.06747536
## + TEMP_AFTERN_qnt_50  0.06707204
## + DO_EVEN_MEAN        0.06694364
## + PH_MEAN             0.06683160
## + SALINITY_qnt_50     0.06682361
## + TEMP_AFTERN_qnt_95  0.06664108
## + PH_qnt_50           0.06645517
## <none>                0.06484362
## 
##                   Df    AIC      F Pr(>F)   
## + SALINITY_qnt_95  1 4812.6 4.2985  0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: R2.adj= 0.06856765 
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 
##  
##                      R2.adjusted
## <All variables>       0.08117009
## + TEMP_AFTERN_MEAN    0.07242610
## + TEMP_AFTERN_qnt_5   0.07187588
## + PH_qnt_50           0.07166958
## + PH_MEAN             0.07130127
## + TEMP_AFTERN_qnt_50  0.07129398
## + PH_qnt_5            0.07107212
## + TEMP_AFTERN_qnt_95  0.07096014
## + DO_EVEN_MEAN        0.07077469
## + SALINITY_MEAN       0.07042134
## + SALINITY_qnt_50     0.06903948
## <none>                0.06856765
## 
##                    Df    AIC      F Pr(>F)   
## + TEMP_AFTERN_MEAN  1 4810.1 4.4276  0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: R2.adj= 0.0724261 
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN 
##  
##                      R2.adjusted
## <All variables>       0.08117009
## + DO_EVEN_MEAN        0.07900793
## + PH_qnt_5            0.07594580
## + SALINITY_qnt_50     0.07431980
## + PH_MEAN             0.07296529
## + PH_qnt_50           0.07278029
## + SALINITY_MEAN       0.07265161
## + TEMP_AFTERN_qnt_5   0.07260285
## + TEMP_AFTERN_qnt_50  0.07245540
## <none>                0.07242610
## + TEMP_AFTERN_qnt_95  0.07242516
## 
##                Df    AIC      F Pr(>F)   
## + DO_EVEN_MEAN  1 4805.2 6.8815  0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: R2.adj= 0.07900793 
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN 
##  
##                      R2.adjusted
## <All variables>       0.08117009
## + PH_qnt_5            0.08066151
## + PH_MEAN             0.08038765
## + PH_qnt_50           0.08004213
## + TEMP_AFTERN_qnt_5   0.07993525
## + SALINITY_qnt_50     0.07959671
## + TEMP_AFTERN_qnt_50  0.07953162
## + SALINITY_MEAN       0.07934320
## + TEMP_AFTERN_qnt_95  0.07921328
## <none>                0.07900793
## 
##            Df    AIC      F Pr(>F)   
## + PH_qnt_5  1 4804.7 2.4785  0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: R2.adj= 0.08066151 
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 
##  
##                      R2.adjusted
## <All variables>       0.08117009
## + SALINITY_qnt_50     0.08101957
## + PH_MEAN             0.08101347
## + PH_qnt_50           0.08101061
## + SALINITY_MEAN       0.08098421
## + TEMP_AFTERN_qnt_5   0.08083069
## + TEMP_AFTERN_qnt_95  0.08079813
## + TEMP_AFTERN_qnt_50  0.08077911
## <none>                0.08066151
## 
##                   Df    AIC      F Pr(>F)  
## + SALINITY_qnt_50  1 4805.4 1.3199  0.076 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: R2.adj= 0.08101957 
## Call: allelecount ~ SALINITY_qnt_5 + PH_qnt_95 + SALINITY_qnt_95 +      TEMP_AFTERN_MEAN + DO_EVEN_MEAN + PH_qnt_5 + SALINITY_qnt_50 
##  
##                      R2.adjusted
## <All variables>       0.08117009
## + PH_qnt_50           0.08117009
## + TEMP_AFTERN_qnt_5   0.08117009
## + TEMP_AFTERN_qnt_95  0.08117009
## + TEMP_AFTERN_qnt_50  0.08117009
## + SALINITY_MEAN       0.08117009
## + PH_MEAN             0.08117009
## <none>                0.08101957
## 
##             Df    AIC      F Pr(>F)
## + PH_qnt_50  1 4806.3 1.1343  0.196

Step-wise selection of variables selects parameters SALINITY_qnt_5, PH_qnt_95, SALINITY_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and PH_qnt_50 as the best model. By contrast, forward selection of variables using R2 as the criterion results selects the combination of SALINITY_qnt_5, PH_qnt_95, SALINITY_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and SALINITY_qnt_50 as best model.

Build model using parameters selected using both models PH_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and SALINITY_qnt_50

RDA of progeny with environmental parameters by spawning event

Determine proportion of genetic variance explained by environmental parameters of outdoor grow-out ponds while controling for family effects by running partial RDA using allele counts as response variables, dummy variables indicating parentage as conditioning matrix, and environmental parameters (PH_qnt_95, TEMP_AFTERN_MEAN, DO_EVEN_MEAN, PH_qnt_5, and SALINITY_qnt_50; parameters in common with both selected best models) as constraining matrix.

## 
##  Summary of RDA results
## Call: rda(X = X, Y = Y, Z = Z, scale = FALSE)
## 
##                  Inertia Proportion Rank
## Total         357.275899   1.000000     
## Conditional   142.662440   0.399306   14
## Constrained     1.601150   0.004482    5
## Unconstrained 213.012310   0.596212  808
## Inertia is variance 
## Some constraints were aliased because they were collinear (redundant)
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2   RDA3   RDA4   RDA5 
## 0.3879 0.3599 0.3383 0.2828 0.2322 
## 
## Eigenvalues for unconstrained axes:
##    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8 
## 2.0453 1.8937 1.8458 1.8031 1.7425 1.6916 1.6507 1.6241 
## (Showing 8 of 808 unconstrained eigenvalues)

R2 (0.004) needs to be adjusted based on the number of predictors, as the number of explanatory variables inflates the apparent amount of explained variance due to random correlation. The adjusted R2 value is 0.001.

Test for significance using a permutation test to generate p-value indicating whether to reject null hypothesis (environmental data does not explain genetic variation). Rows of unconstrained matrix (genetic data set) are repeatedly randomized for n permutations. Relationship is considered significant if observed relationship is stronger than the randomly permuted relationships.

## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(X = X, Y = Y, Z = Z, scale = FALSE)
##           Df Variance      F Pr(>F)    
## Model      5    1.601 1.2147  0.001 ***
## Residual 808  213.012                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Proportion of variance explained by the environmental predictors is Proportion column of Constrained row in summary table (equivalent to R2 of multiple regression).

The eigenvalues for the constrained axes reflect the variance explained by each canonical axis; two constraining variables were aliased, leaving two RDA axis. The environmental vectors were fit unto the ordination to obtain projections for aliased and other terms.

## Importance of components:
##                         RDA1   RDA2   RDA3   RDA4   RDA5
## Eigenvalue            0.3879 0.3599 0.3383 0.2828 0.2322
## Proportion Explained  0.2423 0.2248 0.2113 0.1766 0.1450
## Cumulative Proportion 0.2423 0.4671 0.6783 0.8550 1.0000

Variance partitioning

Variance partitioning was used to compare the contribution of family and variation in the selected environmental parameters in structuring observed genomic variation. A full model, using family and environmental variables, a partial model, using family conditioned on environmental variables, and a partial model, using environmental data conditioned on family, were considered to partition the explainable variance into individual (family or environment) and shared components (family plus environment), using vegan. Significance of each component was tested using 1,000 permutations.

Partition variance to determine if geographic location or environmental data driving observed pattern(s).

Partition the explainable variance (i.e. total inertia for constrain matrix of full model): determine total, constrained, unconstrained values of inertia/proportion of total inertia (variance).

Perform RDA for all variables (spatial and environmental; full model).

## Call: rda(formula = allelecount ~ fam.mat + env.mat, scale = FALSE)
## 
##                Inertia Proportion Rank
## Total         357.2759     1.0000     
## Constrained   144.2636     0.4038   19
## Unconstrained 213.0123     0.5962  808
## Inertia is variance 
## Some constraints were aliased because they were collinear (redundant)
## 
## Eigenvalues for constrained axes:
##  RDA1  RDA2  RDA3  RDA4  RDA5  RDA6  RDA7  RDA8  RDA9 RDA10 RDA11 RDA12 RDA13 
## 39.28 28.22 16.02 14.43  8.99  8.17  6.99  6.37  4.78  3.79  2.30  1.99  0.88 
## RDA14 RDA15 RDA16 RDA17 RDA18 RDA19 
##  0.53  0.38  0.34  0.31  0.27  0.22 
## 
## Eigenvalues for unconstrained axes:
##    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8 
## 2.0453 1.8937 1.8458 1.8031 1.7425 1.6916 1.6507 1.6241 
## (Showing 8 of 808 unconstrained eigenvalues)

Test for significance of overall model.

The overall model is significant (P = 0.001).

Partition the variation in genetic data into components accounted for by environmental and spatial variables and their combined effect using adjusted R squared to assess partitions explained by explanatory tables and their combinations.

## 
## Partition of variance in RDA 
## 
## Call: varpart(Y = allelecount, X = ~fam.mat, ~env.mat)
## 
## Explanatory tables:
## X1:  ~fam.mat
## X2:  ~env.mat 
## 
## No. of explanatory tables: 2 
## Total variation (SS): 295467 
##             Variance: 357.28 
## No. of observations: 828 
## 
## Partition table:
##                      Df R.squared Adj.R.squared Testable
## [a+b] = X1           14   0.39931       0.38896     TRUE
## [b+c] = X2            5   0.08523       0.07967     TRUE
## [a+b+c] = X1+X2      19   0.40379       0.38977     TRUE
## Individual fractions                                    
## [a] = X1|X2          14                 0.31010     TRUE
## [b]                   0                 0.07886    FALSE
## [c] = X2|X1           5                 0.00081     TRUE
## [d] = Residuals                         0.61023    FALSE
## ---
## Use function 'rda' to test significance of fractions of interest

Extract fraction explained by each component:

Test significance of all components using permuted p-values.

Table 3: Variance partitioning of variance explained by family (fam), environmental variables (env), and shared effects due to correlation of family and env.

PARTITION VARIANCE PVAL
residuals 0.610 NA
fam+env+shared 0.390 0.001
fam+shared 0.389 0.001
fam 0.310 0.001
env+shared 0.080 0.001
shared 0.079 NA
env 0.001 0.001

Clustering of individuals

Compare clustering of individuals according to spawning event and family for component of genomic signal explained by differences in environmental conditions in the grow-out ponds after adjusting for family.

Fig 6a: pRDA biplot indicating environmental variables (arrows) and clustering of individuals by spawning event (shape) and sample point (fill).

Fig 6a: pRDA biplot indicating environmental variables (arrows) and clustering of individuals by spawning event (shape) and sample point (fill).

Compare distribution of progeny by within spawning events.

Fig 6b: pRDA biplot indicating environmental variables (arrows) and clustering of individuals by spawning event (shape) and sample point (fill).

Fig 6b: pRDA biplot indicating environmental variables (arrows) and clustering of individuals by spawning event (shape) and sample point (fill).

Compare distribution of progeny by dam across spawning events and sample points.

Fig 6c: pRDA indicating clustering of individuals by spawning event (shape) and sample point (fill) per dam.

Fig 6c: pRDA indicating clustering of individuals by spawning event (shape) and sample point (fill) per dam.

Compare distribution of progeny by sire across spawning events and sample points.

Fig 6d: pRDA indicating clustering of individuals by spawning event (shape) and sample point (fill) per sire

Fig 6d: pRDA indicating clustering of individuals by spawning event (shape) and sample point (fill) per sire

Associated loci

Use the Mahalanobis distance to identify alleles with strongest associations to first two RDA axes (p < 0.001).

Table 4: Number of significantly associated loci (Mahalanobis distance >= 13.82)

MAH_DIST >= 13.82 n
TRUE 33

Fig 7a: Distribution of loci flagged as non-associated (grey) vs outlier (red) based on Mahalanobis distance threshold of 13.82 (p < 0.001.

Fig 7a: Distribution of loci flagged as non-associated (grey) vs outlier (red) based on Mahalanobis distance threshold of 13.82 (p < 0.001.

Compare distribution of flagged loci on genome scaffolds.

Fig 7b: Distribution of Mahalanobis distance per locus across the genome.

Fig 7b: Distribution of Mahalanobis distance per locus across the genome.

Table 5: Number of loci flagged as significantly correlated per scaffold.

SCAFFOLD p < 0.001 p < 0.01 p < 0.05 total_outlier
scf7180000009121 2 1 1 4
scf7180000009161 1 0 2 3
scf7180000009255 1 1 1 3
scf7180000009291 0 1 2 3
scf7180000009305 0 3 0 3
scf7180000009492 0 2 1 3
scf7180000009535 0 0 3 3
scf7180000009563 1 2 0 3
scf7180000012298 1 1 1 3
scf7180000012302 0 3 0 3
scf7180000012359 0 0 3 3
scf7180000012438 2 0 1 3
scf7180000009116 2 0 0 2
scf7180000009119 0 1 1 2
scf7180000009177 2 0 0 2
scf7180000009298 0 2 0 2
scf7180000009351 0 1 1 2
scf7180000009427 0 0 2 2
scf7180000009429 1 0 1 2
scf7180000009565 0 0 2 2
scf7180000009566 0 0 2 2
scf7180000009702 0 1 1 2
scf7180000009895 1 0 1 2
scf7180000010112 1 1 0 2
scf7180000012324 0 1 1 2
scf7180000012327 1 1 0 2
scf7180000012435 1 0 1 2
scf7180000012971 0 1 1 2

Compare clustering of flagged loci on scaffolds with four flagged loci.

Fig 7c: Distribution of Mahalanobis distance per locus for scaffolds with 4 flagged outlier loci. Loci above red dashed line are flagged as significantly associated.

Fig 7c: Distribution of Mahalanobis distance per locus for scaffolds with 4 flagged outlier loci. Loci above red dashed line are flagged as significantly associated.

Compare clustering of flagged loci on scaffolds with three flagged loci.

Fig 7d: Distribution of Mahalanobis distance per locus for scaffolds with three flagged outlier loci. Loci above red dashed line are flagged as significantly associated.

Fig 7d: Distribution of Mahalanobis distance per locus for scaffolds with three flagged outlier loci. Loci above red dashed line are flagged as significantly associated.

Compare clustering of flagged loci on scaffolds with two flagged loci.

Fig 7e: Distribution of Mahalanobis distance per locus for scaffolds with two flagged outlier loci. Loci above red dashed line are flagged as significantly associated.

Fig 7e: Distribution of Mahalanobis distance per locus for scaffolds with two flagged outlier loci. Loci above red dashed line are flagged as significantly associated.

Identity of genes potentially associated with environment (selection)

Compare significant outlier from the RDA to a preliminary draft of the red drum genome to identify genes potentially affected by selection. For each flagged locus, extract all genes with boundaries falling within 50kb using the annotated red drum scaffolds.

out_tbl <- read_tsv("results/pRDA_sign_loci_model3.rda") %>%
  extract(LOCUS, into = c("locus", "pos"), regex = "(.*_pilon)_(\\d+)")

# Make a list of the unique scaffolds
contigs_uniq <- out_tbl %>%
  pull(locus) %>%
  unique()

# Write to a file
contigs_uniq %>%
  write_lines("results/drum_contigs_uniq.txt")

Extract the relevant scaffolds from a gff files of the genome.


# get annotation information for the matchin scaffolds
zgrep -f results/drum_contigs_uniq.txt data/REF/pyu_contig3_snap2.maker.output_new.0.6.AED_annotated_blast.gff.gz > data/REF/drum_matching_scaffolds.gff

Import annotation data for scaffolds with flagged loci and identify genes withing 50kb of flagged loci.

anno_raw <- read_tsv("data/REF/drum_matching_scaffolds.gff", col_names = FALSE) %>%
  filter(X3 == "gene") %>%
  select(scaf = X1, start = X4, end = X5, id = X9) 

anno_mod <- anno_raw %>%
  extract(id, "gene_id", "ID=(Soce_v1_\\d+)|") %>%
  extract(scaf, "scaffold", "lcl\\|(scf.*)") %>%
  mutate(gene_id = paste(gene_id, "-RA", sep = ""))

out_genes <- out_tbl %>%
  mutate(pos = as.integer(pos)) %>% 
  mutate(left = pos - 50000, right = pos + 50000) %>%
  group_by(locus, pos) %>%
  group_split() %>%
  map(function(x) {
    
    #browser()
    genes <- anno_mod %>%
      filter(scaffold == x$locus) %>%
      filter((start > x$left & start < x$right) |  (end > x$left & end < x$right)) %>%
      mutate(out_pos = x$pos, out_mah = x$mah_dist)
    
    return(genes)
    
  }) %>%
  bind_rows() %>%
  distinct(gene_id, .keep_all = TRUE)

out_info <- out_genes %>%
  select(gene_id, out_pos, out_mah)

res_tbl <- read_tsv("data/REF/output.blastp", col_names = FALSE) %>%
  extract(X1, "gene_id", "^(Soce_v1_\\d+-RA)") %>%
  filter(gene_id %in% out_genes$gene_id) %>%
  left_join(out_genes, by = "gene_id") %>%
  select(gene_id, desc = X13, start, end, out_pos, out_mah, scaffold) %>%
  arrange(desc(out_mah), desc(start))

#filtered_gene_list <- res_tbl %>%
#  filter(out_pos >= start, out_pos <= end)

# Get the gene closest to each outlier
top_hits <- res_tbl %>%
  group_by(scaffold, out_pos) %>% 
  mutate(direction = case_when(out_pos < start ~ "left",
                               out_pos >= start & out_pos <= end ~ "within",
                               out_pos > end ~ "right"),
         distance = case_when(direction == "left" ~ start - out_pos,
                              direction == "within" ~ 0,
                              direction == "right" ~ out_pos - end)) %>%
  top_n(-1, distance) %>%
  select(scaffold, out_pos, desc)

summary_tbl <- res_tbl %>%
  group_by(scaffold, out_pos) %>%
  summarize(num_genes_100k = n(), mah_dist = out_mah[1]) %>%
  arrange(desc(mah_dist)) %>%
  left_join(top_hits, by = c("scaffold", "out_pos")) %>%
  rename(nearest_gene = desc)

# Note that the output has 34 rows for 33 outliers because one outlier has 2 genes that are equally distant

write_csv(res_tbl, "results/red_drum_env_genes_tbl.csv")

write_csv(summary_tbl, "results/red_drum_env_genes_summary_tbl.csv")

A total of 33 loci were flagged as significant in the RDA analysis were used to identify potential genes under selection in the red drum genome. One hundred fifty six genes were identified in 100kb windows flanking each outlier locus. A full list of the genes identified and a summary table including the outlier location.

Table 6: Number of genes with in 50kb of flagged locus and summary of gene nearest to each flagged locus.

scaffold out_pos num_genes_100k mah_dist nearest_gene
scf7180000010096_pilon 120999 6 66.10 PREDICTED: solute carrier family 12 member 9-like [Larimichthys crocea]
scf7180000009434_pilon 677210 5 61.82 PREDICTED: membrane-associated guanylate kinase, WW and PDZ domain-containing protein 3 [Stegastes partitus]
scf7180000009140_pilon 2779714 3 45.61 PREDICTED: methylated-DNA–protein-cysteine methyltransferase isoform X1 [Lates calcarifer]
scf7180000009116_pilon 6339640 3 34.49 PREDICTED: A disintegrin and metalloproteinase with thrombospondin motifs 7 [Larimichthys crocea]
scf7180000012438_pilon 897449 14 34.46 60S ribosomal protein L23a [Amphiprion ocellaris]
scf7180000012313_pilon 151064 10 33.55 methyl-CpG-binding domain protein 6-like [Seriola lalandi dorsalis]
scf7180000012436_pilon 143986 5 29.20 PREDICTED: uncharacterized protein LOC104931461 isoform X1 [Larimichthys crocea]
scf7180000009563_pilon 821390 4 28.48 PREDICTED: nuclear factor of activated T-cells, cytoplasmic 2 isoform X2 [Larimichthys crocea]
scf7180000013177_pilon 526531 3 26.29 PREDICTED: junctional adhesion molecule A-like [Larimichthys crocea]
scf7180000012327_pilon 949860 1 25.93 PREDICTED: zinc finger protein 236 isoform X2 [Larimichthys crocea]
scf7180000009121_pilon 14240 5 25.75 PREDICTED: collagen alpha-3(IX) chain [Larimichthys crocea]
scf7180000009895_pilon 473728 9 25.26 uncharacterized protein LOC111858574 isoform X1 [Paramormyrops kingsleyae]
scf7180000009418_pilon 148815 3 24.66 spectrin beta chain, non-erythrocytic 5 [Seriola dumerili]
scf7180000009631_pilon 839556 1 23.51 PREDICTED: slit homolog 3 protein, partial [Larimichthys crocea]
scf7180000012298_pilon 1836218 1 22.03 PREDICTED: neurexophilin-1 [Larimichthys crocea]
scf7180000013182_pilon 1081449 5 20.14 PREDICTED: rap1 GTPase-activating protein 1-like isoform X5 [Larimichthys crocea]
scf7180000009121_pilon 3257984 6 19.62 PREDICTED: gamma-glutamyltransferase 7 [Larimichthys crocea]
scf7180000009177_pilon 2488072 1 19.45 PREDICTED: collagen alpha-6(IV) chain [Larimichthys crocea]
scf7180000013165_pilon 666966 5 18.58 PREDICTED: GDNF family receptor alpha-2-like [Larimichthys crocea]
scf7180000009116_pilon 3039707 2 18.43 uncharacterized protein LOC110962793 [Acanthochromis polyacanthus]
scf7180000009599_pilon 147918 7 18.12 PREDICTED: mitochondrial Rho GTPase 1-A-like [Larimichthys crocea]
scf7180000012435_pilon 1196576 1 17.21 PREDICTED: glutamate receptor ionotropic, kainate 2-like [Notothenia coriiceps]
scf7180000009733_pilon 28954 3 17.10 PREDICTED: death domain-containing protein 1 [Larimichthys crocea]
scf7180000009177_pilon 1944675 7 15.83 uncharacterized protein LOC111661711 [Seriola lalandi dorsalis]
scf7180000012438_pilon 2323232 5 15.54 PREDICTED: protein YIPF5 [Haplochromis burtoni]
scf7180000009161_pilon 5008415 12 15.41 PREDICTED: transmembrane protein 100-like [Larimichthys crocea]
scf7180000009598_pilon 1739088 5 15.25 PREDICTED: microtubule-associated protein RP/EB family member 2 isoform X1 [Larimichthys crocea]
scf7180000010112_pilon 10613 3 15.13 PREDICTED: uncharacterized protein LOC106013965 [Aplysia californica]
scf7180000009115_pilon 4098131 3 14.48 PREDICTED: tyrosine-protein kinase SgK223 [Larimichthys crocea]
scf7180000009255_pilon 854075 8 14.45 PREDICTED: protein DENND6A [Larimichthys crocea]
scf7180000009429_pilon 429853 3 14.42 mitotic-spindle organizing protein 2-like isoform X1 [Labrus bergylta]
scf7180000009429_pilon 429853 3 14.42 PREDICTED: DNA-directed DNA/RNA polymerase mu [Larimichthys crocea]
scf7180000009279_pilon 1366461 6 14.24 PREDICTED: probable G-protein coupled receptor 153 isoform X1 [Larimichthys crocea]
scf7180000009860_pilon 1038811 1 14.16 PREDICTED: uncharacterized protein LOC106526999 [Austrofundulus limnaeus]

Top genes candidates (ranked; mahalanobis distance > 30)

References

Forester, Brenna R., Matthew R. Jones, Stéphane Joost, Erin L. Landguth, and Jesse R. Lasky. 2016. “Detecting spatial genetic signatures of local adaptation in heterogeneous landscapes.” Molecular Ecology 25 (1): 104–20. https://doi.org/10.1111/mec.13476.

Forester, Brenna R., Jesse R. Lasky, Helene H. Wagner, and Dean L. Urban. 2018. “Comparing methods for detecting multilocus adaptation with multivariate genotype–environment associations.” Molecular Ecology 27 (9): 2215–33. https://doi.org/10.1111/mec.14584.

Laget, Sophie, Michael Joulie, Florent Le Masson, Nobuhiro Sasai, Elisabeth Christians, Sriharsa Pradhan, Richard J Roberts, and Pierre Antoine Defossez. 2010. “The human proteins MBD5 and MBD6 associate with heterochromatin but they do not bind methylated DNA.” PLoS ONE 5 (8): e11982. https://doi.org/10.1371/journal.pone.0011982.

Liu, Chuan-Ju, Wei Kong, Kiril Ilalov, Shuang Yu, Ke Xu, Lisa Prazak, Marc Fajardo, et al. 2006. “ADAMTS‐7: a metalloproteinase that directly binds to and degrades cartilage oligomeric matrix protein.” The FASEB Journal 20 (7): 988–90. https://doi.org/10.1096/fj.05-3877fje.

Oksanen, Jari, Roeland Kindt, Pierre Legendre, Bob O Hara, Gavin L Simpson, Petery Solymos, M Henry H Stevens, and Helene Wagner. 2010. “vegan: Community Ecology Package. R package version 1.17-3.” October 10 (01): 2008. http:/.

Verri, Tiziano, Genciana Terova, Alessandro Romano, Amilcare Barca, Paola Pisani, Carlo Storelli, and Marco Saroglia. 2012. “The SoLute Carrier (SLC) Family Series in Teleost Fish.” In Functional Genomics in Aquaculture, 219–320. Oxford, UK: Wiley-Blackwell. https://doi.org/10.1002/9781118350041.ch10.

Walter, Ronald B., Huang Mo Sung, Gabe W Intano, and Christi A Walter. 2001. “Characterization of O6-methylguanine-DNA-methyltransferase (O6-MGMT) activity in Xiphophorus fishes.” Mutation Research - Genetic Toxicology and Environmental Mutagenesis 493 (1-2): 11–22. https://doi.org/10.1016/S1383-5718(01)00169-3.

Wright, Gavin J, Jonathan D Leslie, Linda Ariza-McNaughton, and Julian Lewis. 2004. “Delta proteins and MAGI proteins: An interaction of Notch ligands with intracellular scaffolding molecules and its significance for zebrafish development.” Development 131 (22): 5659–69. https://doi.org/10.1242/dev.01417.